Deceptive review detection via hierarchical neural network model with attention mechanism
YAN Mengxiang, JI Donghong, REN Yafeng
Journal of Computer Applications    2019, 39 (7): 1925-1930.   DOI: 10.11772/j.issn.1001-9081.2018112340

Concerning the problem that traditional discrete models fail to capture the global semantic information of the whole comment text in deceptive review detection, a hierarchical neural network model with an attention mechanism was proposed. Firstly, different neural network models were adopted to model the text structure, and the model yielding the best semantic representation was identified. Then, the review was modeled by two attention mechanisms based on the user view and the product view respectively: the user view focuses on the user's preferences in the comment text, while the product view focuses on the product features in the comment text. Finally, the two representations learned from the user and product views were combined as the final semantic representation for deceptive review detection. Experiments were carried out on the Yelp dataset with accuracy as the evaluation indicator. The experimental results show that the proposed hierarchical neural network model with attention mechanism performs best, with accuracy 1 to 4 percentage points higher than that of traditional discrete methods and existing neural baseline models.
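The following is a minimal PyTorch sketch of the two-view attention idea described in the abstract: sentence vectors are attended to once from a user embedding and once from a product embedding, and the two resulting document vectors are concatenated for classification. The bilinear scoring form, layer sizes, and names are illustrative assumptions, not the authors' implementation; the sentence encodings are random tensors purely to make the sketch runnable.

```python
# Illustrative sketch only: two-view (user/product) attention over sentence vectors.
import torch
import torch.nn as nn

class TwoViewAttentionClassifier(nn.Module):
    def __init__(self, hidden_dim=128, num_users=1000, num_products=1000):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, hidden_dim)
        self.prod_emb = nn.Embedding(num_products, hidden_dim)
        # Bilinear attention scores: score(h_i, v) = h_i^T W v  (assumed form)
        self.W_user = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_prod = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # deceptive vs. genuine

    def attend(self, H, v, W):
        # H: (batch, num_sentences, hidden); v: (batch, hidden)
        scores = torch.bmm(W(H), v.unsqueeze(2)).squeeze(2)   # (batch, num_sentences)
        alpha = torch.softmax(scores, dim=1).unsqueeze(1)     # attention weights
        return torch.bmm(alpha, H).squeeze(1)                 # weighted document vector

    def forward(self, H, user_ids, prod_ids):
        doc_user = self.attend(H, self.user_emb(user_ids), self.W_user)  # user view
        doc_prod = self.attend(H, self.prod_emb(prod_ids), self.W_prod)  # product view
        return self.classifier(torch.cat([doc_user, doc_prod], dim=1))

# Toy usage: 2 reviews, 5 sentence vectors each (random stand-ins for the encoder output)
model = TwoViewAttentionClassifier()
H = torch.randn(2, 5, 128)
logits = model(H, torch.tensor([3, 7]), torch.tensor([11, 42]))
print(logits.shape)  # torch.Size([2, 2])
```

In the paper the sentence vectors would come from the hierarchical encoder chosen in the first step; only the combination of the two attention views is sketched here.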

Product property sentiment analysis based on neural network model
LIU Xinxing, JI Donghong, REN Yafeng
Journal of Computer Applications    2017, 37 (6): 1735-1740.   DOI: 10.11772/j.issn.1001-9081.2017.06.1735
Concerning the poor performance of simple word-vector-based neural network models on product property sentiment analysis, a gated recursive neural network model integrating discrete features and word vector embeddings was proposed. Firstly, the sentences were modeled with a directed recurrent graph, and the gated recursive neural network model was adopted to complete product property sentiment analysis. Then, the discrete features and word vector embeddings were integrated into the gated recursive neural network. Finally, feature extraction and sentiment analysis were completed in three different task models: the pipeline model, the joint model and the collapsed model. Experiments were conducted on the laptop and restaurant review datasets of SemEval-2014, with the macro F1 score as the evaluation indicator. The gated recursive neural network model achieved F1 scores of 48.21% and 62.19%, which are nearly 1.5 percentage points higher than those of the ordinary recursive neural network model. The results indicate that the gated recursive neural network can capture complicated features and improve the performance of product property sentiment analysis. The proposed model integrating discrete features and word vector embeddings achieved F1 scores of 49.26% and 63.31%, which are 0.5 to 1.0 percentage points higher than the baseline methods. These results show that discrete features and word vector embeddings complement each other, and that a neural network model based only on word embeddings still has room for improvement. Among the three task models, the pipeline model achieves the highest F1 scores, so it is better to complete feature extraction and sentiment analysis separately.
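As a rough illustration of the fusion step only (not of the gated recursive composition over the sentence graph), the sketch below combines a dense representation with a projected discrete-feature vector through a learned gate before the sentiment classifier. The dimensions, label set and gating form are assumptions for illustration.

```python
# Illustrative sketch only: gated fusion of dense vectors with discrete features.
import torch
import torch.nn as nn

class GatedFeatureFusion(nn.Module):
    def __init__(self, dense_dim=100, discrete_dim=5000, num_labels=3):
        super().__init__()
        self.project = nn.Linear(discrete_dim, dense_dim)   # embed sparse discrete features
        self.gate = nn.Linear(2 * dense_dim, dense_dim)     # gate decides which source to trust
        self.out = nn.Linear(dense_dim, num_labels)         # e.g. positive / negative / neutral

    def forward(self, h_dense, f_discrete):
        f = torch.tanh(self.project(f_discrete))
        g = torch.sigmoid(self.gate(torch.cat([h_dense, f], dim=-1)))
        fused = g * h_dense + (1 - g) * f
        return self.out(fused)

model = GatedFeatureFusion()
h = torch.randn(4, 100)                     # stand-in for the recursive network's output
f = torch.zeros(4, 5000); f[:, :10] = 1.0   # toy binary discrete features
print(model(h, f).shape)                    # torch.Size([4, 3])
```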
Twitter text normalization based on unsupervised learning algorithm
DENG Jiayuan, JI Donghong, FEI Chaoqun, REN Yafeng
Journal of Computer Applications    2016, 36 (7): 1887-1892.   DOI: 10.11772/j.issn.1001-9081.2016.07.1887
Twitter messages contain a large number of nonstandard tokens, created unintentionally or intentionally by users. Normalizing these nonstandard tokens is crucial for various natural language processing applications. Since existing normalization systems perform poorly, a novel unsupervised normalization system was proposed. First, a standard dictionary was used to determine whether a tweet needed to be normalized. Second, each nonstandard token was assigned 1-to-1 or 1-to-N recovery based on its characteristics; for 1-to-N recovery, the nonstandard token was divided into multiple possible words using forward and backward search. Third, normalization candidates were generated for the nonstandard tokens and their possible words by integrating a random walk model and a spelling checker. Finally, the best normalized tweet was obtained by scoring all candidates with an n-gram language model. The experimental results on a manually annotated dataset show that the proposed approach obtains an F-score of 86.4%, which is 10 percentage points higher than that of the current best graph-based random walk algorithm.
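The final candidate-selection step can be illustrated with the toy Python sketch below: each out-of-vocabulary token is mapped to a candidate set, and the combination with the highest language-model score is kept. The dictionary, candidate lists and bigram scores are fabricated stand-ins for the random-walk/spell-checker candidates and the real n-gram model used in the paper.

```python
# Toy sketch only: pick the candidate sequence with the best bigram language-model score.
import itertools
import math

DICTIONARY = {"see", "you", "tomorrow", "at", "the", "game"}
CANDIDATES = {"c": ["see", "sea"], "u": ["you"], "2morrow": ["tomorrow"]}   # hand-written
BIGRAMS = {("see", "you"), ("you", "tomorrow")}                             # toy bigram table

def bigram_logprob(prev, word):
    # Stand-in for a real n-gram model: reward known bigrams and in-dictionary words.
    if (prev, word) in BIGRAMS:
        return 0.0
    return -1.0 if word in DICTIONARY else -5.0

def normalize(tokens):
    options = [CANDIDATES.get(t, [t]) for t in tokens]       # candidate set per token
    best, best_score = None, -math.inf
    for seq in itertools.product(*options):                  # enumerate candidate sequences
        score = sum(bigram_logprob(p, w) for p, w in zip(("<s>",) + seq, seq))
        if score > best_score:
            best, best_score = seq, score
    return list(best)

print(normalize(["c", "u", "2morrow", "at", "the", "game"]))
# ['see', 'you', 'tomorrow', 'at', 'the', 'game']
```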
Correlation between phrases in advertisement based on recursive autoencoder
HU Qinghui, WEI Shiwei, XIE Zhongqian, REN Yafeng
Journal of Computer Applications    2016, 36 (1): 154-157.   DOI: 10.11772/j.issn.1001-9081.2016.01.0154
Focusing on the issue that most research on the correlation between advertising phrases stays at the literal level and cannot exploit the deep semantic information of the phrases, which limits performance on the task, a novel method was proposed to calculate the correlation between phrases using deep learning techniques. A Recursive AutoEncoder (RAE) was developed to make full use of the semantic information in word order and phrases, so that the phrase vector contains deeper semantic information, and a method for calculating correlation in the advertising setting was built. Specifically, for a given list of phrases, a reconstruction error was produced by merging two adjacent elements; a phrase tree, similar to a Huffman tree, was then built by repeatedly merging the two adjacent elements with the smallest reconstruction error. Gradient descent was used to minimize the reconstruction error of the phrase tree, and cosine distance was used to measure the correlation between phrases. The experimental results show that introducing weight information increases the contribution of important phrases to the final phrase vector representation, and that RAE is well suited to phrase computation. Compared with LDA (Latent Dirichlet Allocation) and the BM25 algorithm at the same recall rate of 50%, the proposed method increases accuracy by 4.59% and 3.21% respectively, which proves its effectiveness.
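The greedy tree-building and cosine-distance steps can be sketched in NumPy as below. The encoder and decoder weights are random rather than trained by minimizing reconstruction error as in the paper, the weighting of important phrases is omitted, and all dimensions are illustrative assumptions.

```python
# Illustrative sketch only: greedy RAE-style composition and cosine similarity of phrase vectors.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
W_enc = rng.normal(scale=0.1, size=(DIM, 2 * DIM))   # encoder: child pair -> parent
W_dec = rng.normal(scale=0.1, size=(2 * DIM, DIM))   # decoder: parent -> reconstructed pair

def encode_pair(a, b):
    children = np.concatenate([a, b])
    parent = np.tanh(W_enc @ children)
    recon = np.tanh(W_dec @ parent)
    return parent, np.sum((recon - children) ** 2)    # reconstruction error of the merge

def phrase_vector(word_vecs):
    nodes = list(word_vecs)
    while len(nodes) > 1:
        # Greedily merge the adjacent pair with the smallest reconstruction error,
        # building the Huffman-like phrase tree described in the abstract.
        pairs = [encode_pair(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = int(np.argmin([err for _, err in pairs]))
        nodes[i:i + 2] = [pairs[i][0]]
    return nodes[0]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

p1 = phrase_vector(rng.normal(size=(3, DIM)))   # toy phrase of 3 random "word vectors"
p2 = phrase_vector(rng.normal(size=(4, DIM)))
print(round(cosine(p1, p2), 3))                 # correlation score between the two phrases
```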